Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lambda.subst: also update debug event environments #1751

Merged
merged 3 commits into from Jul 19, 2018
Merged

Conversation

trefis
Copy link
Contributor

@trefis trefis commented May 1, 2018

https://caml.inria.fr/mantis/view.php?id=7554

@alainfrisch , @gasche : any opinion on the approach proposed here?

Testing

$ cat pr7554.ml
let () =
  let x, y, z = (List.length [], 3.14, "haha") in
  Printf.printf "%d - %.0f%%) : %S\n"
    x y z
$ ocamlc -g ./pr7554.ml -o pr7554.exe
$ ocamldebug ./pr7554.exe
	OCaml Debugger version 4.08.0+dev0-2018-04-09

(ocd) break @Pr7554 2
Loading program... done.
Breakpoint 1 at 131584: file ./pr7554.ml, line 2, characters 18-32
(ocd) run
Time: 18 - pc: 131584 - module Pr7554
Breakpoint: 1
2   let x, y, z = (List.length []<|a|>, 3.14, "haha") in
(ocd) n
Time: 19 - pc: 131600 - module Pr7554
3   <|b|>Printf.printf "%d - %.0f%%) : %S\n"
(ocd) p x y z

Without the changes proposed here the result is:

Unbound identifier x

With the changes:

x: int = 0
y: float = 3.14
z: string = "haha"

Notes about the implementation

This is a proof of concept.
Things that could/should be tried before merging:

  • it seems sad to go over the lambda twice: once to substitute the subterms, once to update the environments, it's probably possible to do both in a single pass
  • I'm not sure simplify_exits is the only place which introduces new bindings, any other place doing so should probably be updated in the same way.

@gasche
Copy link
Member

gasche commented May 1, 2018

Very naive question: couldn't Lambda.subst just do the right thing on Levent nodes?

@trefis
Copy link
Contributor Author

trefis commented May 2, 2018

As I said:

This is a proof of concept. [...] it seems sad to go over the lambda twice: once to substitute the subterms, once to update the environments, it's probably possible to do both in a single pass

But to answer more precisely: I'm not sure Lambda.subst is the right place. It substitutes idents for arbitrary terms, and it feels a bit weird to add "and also does arbitrary things to the environment of of debug event".
Here we're in a particular case where we just substitute idents. It might make sense to write a specialized version of Lambda.subst that does that plus the environment update.
Or perhaps we could update simplif to update the environment itself during its recursion.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the theory tells us is that a substitution by an environment env goes from an enviromnent E1 to an environment E2 if, for all x \in E1, the term env(x) is well scoped in E2. Then, for any term t that is well-scoped is E1, subst env t is well-scoped in E2. The variable-renaming cases corresponds to the case where env(x) is also a variable, and then has to belong to E2. (In the actual implementation, we only give a fragment of the environment with the notion that variables not in the fragment are unchanged, so we rather go from E,E1 to E,E2 for a common, untouched environment E).

A general way to make subst do the right thing on environments would be to add a parameter of type variable -> value_description -> env -> env, which for a given variable x : vd and result environment E2, extends E2 to contain the environment necessary for env(x). In particular, for a variable->variable renaming the function would just add the renamed binder to the input environment, and for substitutions that turn a variable into a closed term the function would return the input environment unchanged.

In the implementation of Lambda.subst, an auxiliary function of type

(var -> vd -> Env.t -> Env.t) -> _ Ident.Map.t -> summary -> summary

would be handy.

Re. potential specializations, I looked at uses of Lambda.subst, they seem to all be one of the two forms:

  • variable->variable renamings
  • substituting a variable with a closed term (in translclass and translmod),
    or at least a term with free variables present only in the unchanged, common environment
    (neither in the support of the env fragment, nor new variables)

This suggests that having two specialized traversals would also work -- but with more code duplication in Lambda.

@@ -263,7 +263,7 @@ and lambda_event =
{ lev_loc: Location.t;
lev_kind: lambda_event_kind;
lev_repr: int ref option;
lev_env: Env.summary }
lev_env: Env.t }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a wise choice? I thought that Env.summary was designed precisely to be the "compact" part of the environment for use in debug information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where you got that from, but you probably have better sources than I do. I'm also not sure what you're worried about: we still "summarize" these environment when transforming lambda to the next AST, they are never marshalled or anything.
Considering we still have modifications to do on the environment at this point, I think it make more sense to actually modify Env.ts than the summaries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One wouldn't think of looking there but I got it from env.mli :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I hadn't looked there.
Then again, depends how you interpret it. Does "compact" refer to the marshalled representation that end up in .cmo files (or similar), or to something else? My bet is on the former, in which case I believe my change is fine.

@trefis trefis changed the title MPR#7554: have simplif refresh debug events environments when introducing new bindings Lambda.subst: also update debug event environments May 4, 2018
@trefis
Copy link
Contributor Author

trefis commented May 4, 2018

I update Lambda.subst to take an extra parameter which it calls to update debug events environments.
The current change is more invasive than the original version and while it seems correct to me, and the test described above still works, I haven't tested it extensively.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your last commit, which is not terribly surprising as it is close to my substitution-and-scoping rants above. I would encourage you to rebase eventually to remove the previous, more hackish approach from the patchset.

I am still not convinced by the Env.summary => Env.t change in lambda. I think of Env.summary as "inspectable data" and Env.t as "opaque stuff", so to me it seems proper to use the former as part of an intermediate representation such as Lambda.

Is there something problematic in writing an auxiliary function that takes an Env.t -> Env.t transformation, and turns it into an Env.summary -> Env.summary transformation? This seems easy to do from the API, is there some efficiency concern?

@@ -590,7 +590,7 @@ let rec make_sequence fn = function
Assumes that the image of the substitution is out of reach
of the bound variables of the lambda-term (no capture). *)

let rec subst s lam =
let rec subst update_env s lam =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update_env function remains constant over the traversal, so it doesn't need to be a parameter to inner recursive calls. Could you maybe wrap the mutual recursion under a toplevel let subst update_env s lam = function, so that the patch is less invasive and the code lighter? (You still need an indentation bump.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -334,7 +334,7 @@ val transl_class_path: ?loc:Location.t -> Env.t -> Path.t -> lambda

val make_sequence: ('a -> lambda) -> 'a list -> lambda

val subst: lambda Ident.Map.t -> lambda -> lambda
val subst: (Ident.t -> Types.value_description -> Env.t -> Env.t) -> lambda Ident.Map.t -> lambda -> lambda
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could extend the comment with a word on what the new parameter does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do once we settle on what it should actually do.

)
let update_env idmap oldid vd env =
let newid = Ident.Map.find oldid idmap in
Env.add_value newid vd env
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This toplevel definition feels a bit lost, and then the use-sites duplicate idmap-building logic. Why don't you just expose a variant of subst called rename, here or directly in Lambda, that takes a list of (old_id, new_id) pairs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Ident.Map.fold (fun id _ env ->
match Env.find_value (Path.Pident id) evt.lev_env with
| exception Not_found -> env
| vd -> update_env id vd env
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in this branch update_env should be called on a version of env where the id binding has been removed. Otherwise there is no way to remove identifiers from the environment when performing a substitution.

@@ -839,21 +852,22 @@ let field_of_str loc str =


let transl_store_structure glob map prims str =
let no_env_update _ _ env = env in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the implementation of substitution on environments is adapted to remove bindings before update_env is called, then this function could/should renamed into remove_from_env, and I believe this would be the correct behavior for the use-sites below.

@trefis
Copy link
Contributor Author

trefis commented May 7, 2018

I am still not convinced by the Env.summary => Env.t change in lambda. I think of Env.summary as "inspectable data" and Env.t as "opaque stuff", so to me it seems proper to use the former as part of an intermediate representation such as Lambda.

And I myself remain completely unconvinced by that argument. Sorry.

Is there something problematic in writing an auxiliary function that takes an Env.t -> Env.t transformation, and turns it into an Env.summary -> Env.summary transformation? This seems easy to do from the API, is there some efficiency concern?

There is no difficulty: Envaux.env_of_only_summary does the conversion in one direction, Env.summary in the other. But I'm indeed worried about the performance, the first of these two functions seems quite costly.

But regardless of the potential cost, I just disagree with your opinion regarding the use of Env.t vs Env.summary. (I'm repeating that point so as to not get dragged into a discussion about the actual impact on performance of these operations)

I think that in this branch update_env should be called on a version of env where the id binding has been removed. Otherwise there is no way to remove identifiers from the environment when performing a substitution.

Not that it should matter, but there is currently no way to remove anything from the environment, ever.
I'm not quite sure we actually always want that behavior (though I suspect you're right), I will think about it tomorrow (i.e. tomorrow I will try and understand what substitutions translmod applies).

@trefis
Copy link
Contributor Author

trefis commented May 8, 2018

To comment on this:

I'm not quite sure we actually always want that behavior

I was wondering about the impact of these substitutions (the ones done in translmod that is) on "debugability", but given that the substitutions are done then I agree a correct semantic would be to remove the bindings from the environments while performing the substitutions.
It's worth noting that these substitutions are done only with the non-flambda native backend (be it normal compilation or from the native toplevel), both the bytecode compiler and the native compiler with flambda enabled introduce local aliases before doing a simple renaming (i.e. Lambda.rename) which seems much better for debugging.


I'm running a bit out of steam on this PR: I thought it would be a quick and easy (and gratifying, and whatever else you can think of!) fix that would distract me for one evening of the things I usually do; I should have known better :p
(Just to be clear: I don't mean to complain about the review you've done on this Gabriel, I think that thanks to your comments the PR is in a better state that it initially was. I just have other things to do at the moment)

Here is I believe, a summary of the state of this PR:

  1. it seems that we disagree about the change putting Env.ts instead of Env.summarys inside lambda_events.
  2. Lambda.subst should remove values from the environment before calling its update_env parameter. However:
    • the fact that it doesn't is AFAICT not observable from any of the debuggers currently available (that might not be true if something like Make OCaml native debugging awesome #574 ever lands, but I'm not really holding my breath).
    • there is currently no helper to remove a binding from the environment. Writing one seems reasonably easy (though quite ad-hoc), but might have a negative impact on compilation time.

While I feel guilty for saying that, I don't think point 2 is a blocker and I feel like we could merge even without handling it (we're not introducing a regression, we're "only" half-fixing the issue).
Point 1 seems like it might be a blocker for you, but unless another person/argument appears to change my mind, I don't think I'll do any work on it.

So it seems that we have two options here (if we don't want this PR to rot away for months):

  • if you care enough about these issues, and have enough time to work on them, then I can close this PR and you pick up from here and do the changes you proposed as well as some benchmarking to confirm we're not degrading compilation time too much
  • or I rebase this on a recent trunk, clean up the commit history, and we merge it

I'd be happy with either of these two options (and somewhat sad if neither was taken...)

What do you think?

@gasche
Copy link
Member

gasche commented May 8, 2018

I think that your proposals for both (1) and (2) are reasonable and pragmatic. Re. (1), I agree with your point about efficiency concerns, and with the idea that consumers could always perform summarization locally.

(We could think of adding a type parameter to Lambda.t that is either Env.t or Env.summary, to keep exporting summaries to the outside but have environments for internal transformations. But the change to all lambda-using interfaces may make the curse worse than the disease.)

I would find it reassuring to have a comment from someone that knows when and how summarization really matters.)

@trefis
Copy link
Contributor Author

trefis commented May 8, 2018

I would find it reassuring to have a comment from someone that knows when and how summarization really matters

My intuition is that we need to summarize when we serialize (which happens for debug events, but also for cmts).
This might have initially been because of size concerns (again, my intuition would be that the summary is more compact than the whole env) and is currently mandatory because of the environment contains functional values which can't be marshaled.

I somewhat doubt that there ever was another reason, but I'm only guessing.

@trefis
Copy link
Contributor Author

trefis commented May 14, 2018

I have rebased and cleaned up the history.

Given that there has been no comment for the past week regarding the Env.t vs Env.summary discussion, and that we can always revisit that particular change later if someone wakes up to shed some light on the topic, I am tempted to merge this in trunk sometime in the next few days.

@gasche
Copy link
Member

gasche commented May 14, 2018

@xavierleroy: is there any reason not to change the payload of Levent from Env.summary to Env.t, as long as we convert back to summary before any serialization, in particular when producing bytecode?

(To summarize the discussion above: the reason to use Env.t instead of Env.summary is to be able to replace bindings when performing subtitutions, which would take linear time on summaries.)

@damiendoligez
Copy link
Member

I dug up an old discussion on caml-devel (from 2012!) about environment summaries. The gist is that summaries are used in debugger events (in byte code executables) and .cmt files instead of full environments, in order to keep the files reasonably small.

The main protagonists were @lefessan and @xavierleroy. Any opinions?

@xavierleroy
Copy link
Contributor

Summaries (Env.summary) are much more compact than environments (Env.t), by factors of 100 or more. That's why the original code drops Env.t in favor of Env.summary as soon as the full Env.t are no longer needed, i.e. when leaving the typechecker, keeping in Lambda and below the strict minimum Env.summary needed for debugging information. It does make me nervous to keep Env.t live longer than that.

@trefis
Copy link
Contributor Author

trefis commented May 17, 2018

Thanks for the context!

I'm still in favor of keeping the change from summaries to full env in lambda, for the same reasons as previously. That is: Env.t still do not make it to build artifacts, and we're still modifying the environment at that point, so it makes more sense (and is more efficient) to work on the actual environment.

I had a look to see if that last argument could be used to justify pushing Env.t all the way through to the bytecode, and I do not believe so: there are a bunch of calls to Ident.create and Ident.rename in Transl*, Matching and Simplif, but there doesn't seem to be anything after that.
(well, except when we rename globals when building a packed module, but I don't really know what to make of that)

@gasche
Copy link
Member

gasche commented May 21, 2018

After thinking more about the issue, I have come to agree with @trefis here: having Env.summary in lambda trees means that we cannot implement substitution correctly over lambda, and that is a highly undesirable property for an intermediate representation.

It is important that summary is used for anything serialized, and the current PR ensures this. The concern over compactness also comes from memory usage during compilation. Maybe this could be alleviated by some compilation performance testing? If no important degradation is found, I would be in favor of the PR as it is today.

If we wanted to improve on this, several choices:

  • What makes Env.t non-compact is partly the arborescent/mapping structure, and partly due to information that made sense during type-checking but shouldn't be useful after that (callbacks for usage detection and unused warnings, etc.). It could be possible to have a new intermediate representation of environment that keeps the mapping structure, but keeps only the data that is useful for a now-fully-typed representation.
  • As a simpler thing, maybe there is a way to process an Env.t structure to remove the data justified for type inference in a systematic way, without changing the type. (Putting opened = None in IdTbl tables, for example). This is not very nice.
  • As an even simpler thing, it may be possible to change the representation of events to delay term substitutions, storing a pair of a substitution and a summary. Computing the actual environment could be forced when converting out of lambda form, so that the linear cost of the summary -> env transformation is paid only once.

@lpw25
Copy link
Contributor

lpw25 commented May 21, 2018

As an even simpler thing, it may be possible to change the representation of events to delay term substitutions, storing a pair of a substitution and a summary.

Slight variation: you could add a substitution constructor to the Env.summary type and just perform the substitution if/when it is converted back to an ordinary Env.t.

@trefis
Copy link
Contributor Author

trefis commented Jun 5, 2018

I'd still rather keep Env.ts in debug events in lambda, but I'd be OK with Leo's suggestion.

That said, I'm too shy to take a decision on this, so I welcome any opinion (one way or the other).

Introduced "Lambda.rename" for var->var substitutions, this also updates the
debug envs automatically.

This fixes MPR#7554
@trefis
Copy link
Contributor Author

trefis commented Jul 17, 2018

Apparently given enough time, shyness goes away: I rebased this morning, the CIs appear to pass, I'll merge as is later today.

@trefis trefis merged commit f280343 into ocaml:trunk Jul 19, 2018
@trefis trefis deleted the pr7554 branch July 19, 2018 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants